Lexical evolution rates by automated stability measure

Authors

  • Filippo Petroni
  • Maurizio Serva
Abstract

Phylogenetic trees can be reconstructed from the matrix that contains the distances between all pairs of languages in a family. Recently, we proposed a new method which uses normalized Levenshtein distances between words with the same meaning and averages over all the items of a given list. Decisions about the number of items in the input lists for language comparison have been debated since the beginning of glottochronology. The point is that the words associated with some of the meanings undergo rapid lexical evolution. Therefore, a comparison based on a large vocabulary is only apparently more accurate than one based on a smaller vocabulary, since many of the words do not carry any useful information. In principle, one should find the optimal length of the input lists by studying the stability of the different items. In this paper we tackle the problem with an automated methodology based only on our normalized Levenshtein distance. With this approach, the program of an automated reconstruction of language relationships is completed.
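The distance described in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not spell out: the Levenshtein distance between two words is normalized by the length of the longer word, and the two word lists are aligned so that position i holds the word for the same meaning in both languages. The function names `levenshtein`, `normalized_distance` and `language_distance` are hypothetical, and the toy word lists are illustrative only, not data from the paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter word in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalized_distance(a: str, b: str) -> float:
    """Assumed normalization: divide by the length of the longer word."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def language_distance(words_a: list[str], words_b: list[str]) -> float:
    """Average the normalized distance over meaning-aligned word lists."""
    if len(words_a) != len(words_b) or not words_a:
        raise ValueError("word lists must be non-empty and of equal length")
    total = sum(normalized_distance(x, y) for x, y in zip(words_a, words_b))
    return total / len(words_a)


# Toy, meaning-aligned lists (illustrative only).
english = ["night", "water", "star"]
german = ["nacht", "wasser", "stern"]
print(language_distance(english, german))
```

Applying `language_distance` to every pair of languages in a family would fill the distance matrix from which a tree can then be built; the tree reconstruction itself and the stability measure studied in the paper are not covered by this sketch.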


Similar resources

Automated Word Stability and Language Phylogeny

The idea of measuring distance between languages seems to have its roots in the work of the French explorer Dumont D’Urville (1832). He collected comparative word lists of various languages during his voyages aboard the Astrolabe from 1826 to 1829 and, in his work about the geographical division of the Pacific, he proposed a method to measure the degree of relationship among languages. The metho...


Automated words stability and languages phylogeny

The idea of measuring distance between languages seems to have its roots in the work of the French explorer Dumont D'Urville (D'Urville 1832). He collected comparative word lists of various languages during his voyages aboard the Astrolabe from 1826 to 1829 and, in his work about the geographical division of the Pacific, he proposed a method to measure the degree of relation among languages. Th...


Automatic Construction of Persian ICT WordNet using Princeton WordNet

WordNet is a large lexical database of the English language in which nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (synsets). Each synset expresses a distinct concept. Synsets are interlinked by both semantic and lexical relations. WordNet is mainly used for word sense disambiguation, information retrieval, and text translation. In this paper, we propose s...


A system for automated lexical mapping.

OBJECTIVE: To automate the mapping of disparate databases to standardized medical vocabularies. BACKGROUND: Merging of clinical systems and medical databases, or aggregation of information from disparate databases, frequently requires a process whereby vocabularies are compared and similar concepts are mapped. DESIGN: Using a normalization phase followed by a novel alignment stage inspired by ...


Lexical Tightness and Text Complexity

We present a computational notion of Lexical Tightness that measures global cohesion of content words in a text. Lexical tightness represents the degree to which a text tends to use words that are highly inter-associated in the language. We demonstrate the utility of this measure for estimating text complexity as measured by US school grade level designations of texts. Lexical tightness strongl...



Journal:
  • CoRR

Volume: abs/0912.0821

Publication date: 2009